Three Dissimilarity Measures to Contrast Dendrograms

Authors

  • Guillermo Restrepo
  • Héber Mesa
  • Eugenio-José Llanos
Abstract

We discuss three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All of them decompose the dendrograms into subsets. For the triples and partition indices, these subsets correspond to binary partitions containing some clusters, whereas for the cluster index, a novel dissimilarity measure introduced in this paper, the subsets are exclusively clusters. In chemical applications, dendrograms gather clusters that carry similarity information about the data set under study. The cluster index is therefore the most suitable dissimilarity measure between dendrograms resulting from chemical investigations. An application example of the three measures illustrates the advantages of the cluster index over the other two in similarity studies. Finally, the cluster index is used to measure the differences between five dendrograms obtained by applying five common hierarchical clustering algorithms to a database of 1000 molecules.
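The idea of comparing two dendrograms purely through the clusters they contain can be illustrated with a small sketch. The abstract does not give the formula of the cluster index, so the normalization used here (size of the symmetric difference of the two cluster sets divided by the size of their union) is an assumption made for illustration only, as are the nested-tuple tree encoding and the function names `clusters` and `cluster_dissimilarity`.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set under every internal node of the dendrogram."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # each internal node contributes one cluster
        return members

    leaves(tree)
    return found


def cluster_dissimilarity(a: Tree, b: Tree) -> float:
    """Assumed measure: |C(a) symmetric-difference C(b)| / |C(a) union C(b)|.

    This is NOT necessarily the paper's cluster index; it only shows
    a comparison that uses clusters exclusively.
    """
    ca, cb = clusters(a), clusters(b)
    union = ca | cb
    if not union:
        return 0.0
    return len(ca ^ cb) / len(union)


# Two dendrograms over the same five objects, differing in one merge.
d1: Tree = ((("a", "b"), "c"), ("d", "e"))
d2: Tree = ((("a", "c"), "b"), ("d", "e"))

print(cluster_dissimilarity(d1, d1))  # identical trees share all clusters
print(cluster_dissimilarity(d1, d2))  # 2 of the 5 distinct clusters differ
```

Under this assumed measure, two dendrograms are at distance 0 exactly when they contain the same clusters, which matches the abstract's point that the comparison rests on clusters alone rather than on binary partitions.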

Similar resources

CAFE: aCcelerated Alignment-FrEe sequence analysis

Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software,...

Additive Evolutionary Trees

Metric trees are dendrograms which show the phylogenetic relationships for a set of contemporary species. These dendrograms have numerical values attached to the branches. If the sum of these values on the branches between any two contemporary species is equal to the dissimilarity between these two species, the metric tree is said to be additive and possess an additive dissimilarity matrix. Met...

Assessing Dissimilarity Measures for Sample-Based Hierarchical Clustering of RNA Sequencing Data Using Plasmode Datasets

Sample- and gene-based hierarchical cluster analyses have been widely adopted as tools for exploring gene expression data in high-throughput experiments. Gene expression values (read counts) generated by RNA sequencing technology (RNA-seq) are discrete variables with special statistical properties, such as over-dispersion and right-skewness. Additionally, read counts are subject to technology a...

Sandrine Pavoine

Ecological studies have now gone beyond measures of species turnover towards measures of phylogenetic and functional dissimilarity. This change of perspective has a main objective: disentangling the processes that drive species distributions from local to broad scales. A fundamental difference between phylogenetic and functional analyses is that phylogeny is intrinsically dependent ...

Combining dissimilarity measures for prototype-based classification

Prototype-based classification, identifying representatives of the data and suitable measures of dissimilarity, has been used successfully for tasks where interpretability of the classification is key. In many practical problems, one object is represented by a collection of different subsets of features, that might require different dissimilarity measures. In this paper we present a technique f...

Journal title:
  • Journal of Chemical Information and Modeling

Volume 47, Issue 3

Pages: -

Publication date: 2007